ABSTRACT



Classically, the invariant property of maximum likelihood estimators 
has been limited by one-to-one restrictions on the transformation. This 
thesis defines the Induced Likelihood Function and develops a theorem 
which may be used to extend the invariant property to estimation problems 
where the one-to-one restriction is dropped. It is shown that the theorem 
is applicable to the k dimensional estimation problem. 

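Theorem:

If 1) $f$ is a function such that $S$ is mapped into $E^1$ and
$f(\hat\theta) \ge f(\theta)$ for all $\theta$ in $S$,

2) $\phi$ is a transformation such that $S$ is mapped onto $S^*$,
where $\phi(\hat\theta) = \hat\theta^*$ and $\phi(\theta) = \theta^*$ for all $\theta$ in $S$.
Define an inverse $\phi^{-1}$ on $S^*$ such that $\phi^{-1}(\hat\theta^*) = \hat\theta$ and
$\phi^{-1}(\theta^*) = \theta$ for all $\theta^*$ in $S^*$,

3) $g$ is a function defined by $g(\theta^*) = f(\phi^{-1}(\theta^*))$ such
that $S^*$ is mapped into $E^1$,

then $g(\hat\theta^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$.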

The writer wishes to express his appreciation to Professor P. W. 

Zehna for his guidance, assistance and the essential elements for the 
proof of the above theorem and to Professor J. R. Borsting for his sug- 
gestions and encouragement. 
SECTION I



INTRODUCTION 



1.1 Statement of the Problem

The invariance property of maximum likelihood estimation provides a 
very convenient tool for statistical application. However, its use is 
somewhat limited in practical applications since the property apparently 
has only been shown to hold when the transformation of the parameter space 
is one-to-one. This investigation evolved as a follow-up to one phase of 
a reliability study undertaken by Captain W. J. Corcoran, USN, and
Dr. H. Weingarten of the Technical Division of the Special Projects Office
of the U. S. Navy and Dr. P. W. Zehna of CSIR (13)(14). This SP sponsored
study presented several estimators for the parameters of a conceptual
reliability model based on the multinomial probability distribution. One
of the proposed estimates was "like" a maximum likelihood estimate (mle)
in that it was a function of mle's; but since the function was not 1-1,
the estimate was not formally called a mle. Attempts to derive
distribution information concerning this estimate involved very complicated
equations and these difficulties were compounded by the fact that under current
definitions and concepts, maximum likelihood estimation (MLE) distribution
theory was not correctly applicable. It was felt by Dr. Zehna that one
of the primary questions that had to be answered prior to further work on 
the model was, "Does the invariance property of MLE apply when the func- 
tional relationship is not 1-1, and if so, under what conditions?" 
1.2 Purpose of the Study

The purpose of this study is to investigate the invariance property 
of maximum likelihood estimation when the transformation function is not 
one-to-one, and to attempt to formalize concepts, definitions, and theorems 
that are applicable in this and similar situations.

1.3 Thesis Scope and Organization 

Of necessity, it is assumed that the reader has a basic familiarity 
with the theory of probability and statistics, and the method and properties
of maximum likelihood estimation. A brief review of some of the more
pertinent concepts of point estimation along with a summary of the tech- 
nique of MLE is presented in chapter two to provide a minimal common back- 
ground and to assure familiarity with the notation as it is used in later 
discussions. Appendix one contains a chronological key to all notation 
used and is referenced by numbers indicating the page in the thesis on 
which the notation was originally introduced. 

In chapter three the invariance property of MLE is discussed and con- 
cepts and theorems are developed which allow the present theory to be gen- 
eralized and extended. Examples are liberally used to emphasize the points 
under discussion. Chapter four, in summary, attempts to indicate the possible
contributions of the thesis, and suggests possible areas for further
study.
SECTION II



MAXIMUM LIKELIHOOD ESTIMATION 



2.1 Estimation: Basic Concepts

The purpose of statistical estimation is to estimate, on the basis
of an observed sample, the values of the unknown parameters of the population
from which the sample originated. Since 1763, when Bayes' memoirs were
published posthumously, there has been wide controversy and discussion
concerning the various estimation techniques and the properties of the resulting
estimates. Over the years several of these descriptive properties or
characteristics have emerged as "desirable" traits of estimators. After
presenting some concepts and definitions, several of the properties usually
related to maximum likelihood estimates are briefly discussed.

Symbol/Term                      Definition

$x_1, x_2, \ldots, x_n$          A sample or outcome of observed values of the
                                 random variables $X_1, X_2, \ldots, X_n$

$\theta_1, \theta_2, \ldots, \theta_k$   Parameters of an experiment; generally indices
                                 for some family of probability distributions

parameter                        A constant of a probability distribution,
                                 generally unknown in estimation problems

$f(x;\theta)$                    The probability density function of the random
                                 variable X with parameter indexed by $\theta$,
                                 denoted pdf

$E(x)$                           The expectation of x

estimator                        A statistic; a rule for making an estimate of a
                                 parameter; a function of the observed values of
                                 the random variables. An estimator is derived
                                 prior to sampling.

estimate                         A numerical value assigned to a parameter of a
                                 distribution on the basis of evidence from
                                 samples; an observed value of an estimator.
                                 An estimate is made after the sampling. (In
                                 this paper estimate will imply "statistical
                                 point estimate" unless otherwise indicated.)

The distinction between the parameter and its estimate is an important
one. The true parameter value is fixed and unknown. However, with
repetition of an experiment, the sample will vary, and the estimate itself 
will vary and will have a probability distribution. Estimation techniques 
are derived with the assumption that a sample is representative of the true 
population, therefore, the parameter estimate is subject to sampling errors. 
The possible magnitude of sampling error is an important consideration and 
leads to interval estimation which is not discussed in this paper. 

2.2 The Method of Maximum Likelihood Estimation 

The method of moments, introduced by Karl Pearson in 1894, was the earliest
formal technique proposed for point estimation. Since that time many
estimation procedures have been devised, the best known of which are the
methods of minimum chi square, Bayes, minimax, least squares, and maximum
likelihood. It has been said that in many respects the introduction of
maximum likelihood estimation marked the era of modern statistical theory.^1
The principle of maximum likelihood was discussed by Gauss prior to 1880,
but R. A. Fisher formally developed maximum likelihood estimation (MLE) as
a technique in a series of papers, the first of which was presented in
1921 (20).

^1 A. Fraser, Statistics, an Introduction, John Wiley and Sons, Inc.,
p. 224, 1958
Gauss had stated the concept in the following manner. Assume a random
variable (vector) X with real values $(x_1, \ldots, x_n)$ where the pdf of X is a
function of the parameter(s) indexed by $\theta$. Let $\theta$ have a known or assumed
prior distribution with range a to b. Then the a posteriori distribution of
$\theta$ given X = x is

$$f(\theta \mid x) = \frac{f(\theta, x)}{\int_a^b f(\theta, x)\,d\theta} = c(x)\,f(\theta, x).$$

Gauss used the mode of the derived a posteriori distribution as an estimate
of $\theta$. This value is what is commonly known today as the maximum likelihood
estimate, the value of $\theta$ which maximizes the pdf of X with respect to $\theta$.^2

Fisher in his development derived what became known as the "Likelihood
Function", the product of the population densities for each value in the
sample. This function is denoted $L(\theta)$ where $L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)$ and is
regarded as a function of $\theta$ for fixed $x_i$. The method of maximum likelihood
estimation is defined by maximizing this function. Since the logarithm
is a monotonic increasing function, $L(\theta)$ and its log are maximized by the
same value of $\theta$. This is sometimes convenient since manipulation of
$\log L(\theta)$ is often much easier than working with the function directly.

^2 E. L. Lehmann, Notes on the Theory of Estimation, University of
California, p. 1-9, 1950

The procedure for determining the mle of $\theta$ is as follows:

1) Determine the pdf, $f(x;\theta)$.

2) Determine $L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)$ and express as $\log L(\theta)$ if appropriate.

3) Determine a value of $\theta$ which will maximize $L(\theta)$. This value
is usually found by setting the derivative(s) of the likelihood function
with respect to $\theta$ equal to zero and solving the ensuing equation(s) for the
parameter value(s) when conditions exist that make this possible. If $L(\theta)$
is differentiable and has its maximum at an interior point of the range of
$\theta$, the point at which $L(\theta)$ attains this maximum is the mle of $\theta$, denoted
$\hat\theta$, and the "Likelihood Equation" is

$$\left[\frac{\partial}{\partial\theta} \log L(\theta)\right]_{\theta = \hat\theta} = 0.$$
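
The three steps above can be sketched numerically. The following is a minimal illustration only, assuming the exponential density $f(x;\theta) = \theta e^{-\theta x}$ that is used as an example later in the thesis; the sample values, the grid bounds, and the helper names `log_likelihood` and `grid_mle` are hypothetical and are not part of the original text.

```python
import math

def log_likelihood(theta, sample):
    """Step 2: log L(theta) for the assumed pdf f(x; theta) = theta * exp(-theta * x)."""
    n = len(sample)
    return n * math.log(theta) - theta * sum(sample)

def grid_mle(sample, lo=0.001, hi=10.0, steps=20_000):
    """Step 3, done by brute force: maximize log L over an evenly spaced grid."""
    best_theta, best_value = lo, log_likelihood(lo, sample)
    for k in range(1, steps + 1):
        theta = lo + (hi - lo) * k / steps
        value = log_likelihood(theta, sample)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta

sample = [0.5, 1.5, 2.0, 4.0]            # hypothetical observations, x-bar = 2
closed_form = len(sample) / sum(sample)  # root of the likelihood equation, 1 / x-bar
numeric = grid_mle(sample)               # should agree with closed_form up to grid spacing
```

The grid search is a deliberately crude stand-in for solving the likelihood equation; it is included only to show that the root of the equation and the maximizer of $\log L(\theta)$ coincide in this case.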

Setting the derivative of a function equal to zero and solving in terms
of a parameter does not in itself guarantee a maximizing value. If there
is any doubt as to the authenticity of the solution, there should be further
investigation to verify the underlying assumption, namely that the likelihood
equations generally have only maximizing solutions. Lindgren points
out that this is usually the case since $L(\theta)$ is a product of probability
densities and is usually bounded above and continuous in $\theta$.^3

2.3 Desirable Properties of Estimators

There are many ways that an estimator may be chosen. Hopefully statistical
techniques provide the tools for choosing "good" estimators. To
help describe what is meant by "good", several generally desirable properties
or characteristics of estimators have been defined. The properties
usually associated with mle's are discussed below.

1) Unbiasedness: This property is concerned with the distribution
of the estimator. An estimator $\hat\theta(x_1, \ldots, x_n)$ for the parameter
$\theta$ is said to be unbiased if $E(\hat\theta) = \theta$. Then, the bias of $\hat\theta$, denoted b, is
$b = E(\hat\theta) - \theta$. Although unbiasedness is a desirable trait, it is by no means
paramount. Figure 1 shows the densities of three estimators of $\theta$. Although
$\hat\theta_1$ and $\hat\theta_2$ are both unbiased, $\hat\theta_3$ is obviously the best estimator of the three
even though it has positive or right bias. It is apparent that unbiasedness
considered alone does not guarantee a good estimator. The distribution,
variance, and sample size all modify the bias.^4

^3 B. W. Lindgren, Statistical Theory, The Macmillan Company, p. 222,
1960.

2) Consistency: Consistency is a large sample property of an
estimator. An estimator is said to be consistent if its probability
distribution concentrates on the true parameter value as the sample size
becomes infinite. That is, $\hat\theta$ is consistent if $P(|\hat\theta - \theta| < \epsilon) \to 1$ as $n \to \infty$
for every $\epsilon > 0$.




Figure 1. Density Functions of Three Estimators of the Parameter $\theta$



An unbiased estimator is consistent if its variance approaches zero as the
sample size approaches infinity.^5

There may be many consistent estimators of a parameter. Therefore, as
with unbiasedness, the criterion of consistency alone does not guarantee a
useful estimator, although consistency is usually a desirable property.

3) Efficiency: Efficiency provides a criterion for comparing
unbiased estimates of a parameter. As mentioned previously, once it is
known that the distribution of the estimator is centered on the true value
of the parameter $\theta$, the variance of the distribution becomes an important
consideration. If $\hat\theta_1$ and $\hat\theta_2$ are estimates of $\theta$, the "Relative Efficiency"
of $\hat\theta_1$ to $\hat\theta_2$ is $(A_2/A_1) \times 100\%$ where $A_i = E(\hat\theta_i - \theta)^2$. If the ratio $A_2/A_1$
is greater than one, $\hat\theta_1$ may be considered a more efficient, and therefore
perhaps a more suitable estimate than $\hat\theta_2$. If $\hat\theta_1$ and $\hat\theta_2$ are unbiased
estimates of $\theta$, then $A_2/A_1$ is a ratio of variances and will take on its highest
values when $\hat\theta_1$ is an estimate with minimum variance. R. A. Fisher proposed
that the estimator having a minimum variance in large samples should be
called "Efficient". This idea was formalized by a definition very similar
to the following:

Definition: $\hat\theta$ is said to be an efficient estimator of $\theta$ if:

1) $\sqrt{n}(\hat\theta - \theta)$ approaches $N(0, \sigma^2)$ as n approaches infinity.

2) for any other estimator $\theta'$ for which $\sqrt{n}(\theta' - \theta)$ approaches
$N(0, \sigma'^2)$, $\sigma'^2 \ge \sigma^2$. (The efficiency of $\theta'$ is
$(\sigma^2/\sigma'^2) \times 100\%$.)

^4 A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 149, 1950

^5 H. Cramer, Mathematical Methods of Statistics, Princeton University
Press, p. 351, 1946

The Cramer-Rao inequality^6 may be used to find the limiting value of
mean square deviations (variances for unbiased estimators). Efficient
estimators are consistent but are not necessarily unbiased except in the limit.^7

4) Sufficiency: An estimator is sufficient if "it contains all
the information in the sample regarding the parameter",^8 that is, it utilizes
all of the pertinent information in the sample.

^6 ibid., p. 477

^7 A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 151, 1950

^8 ibid.
Definition: $\hat\theta$ is a sufficient estimator of $\theta$ if, given the
value of $\hat\theta(x_1, \ldots, x_n)$, the conditional distribution
of the sample is independent of the parameter $\theta$.

In many situations the evaluation and manipulation of conditional
distributions is very difficult; however, the following criterion allows
determination of sufficiency by discerning if the joint density function can
be properly factored.

Theorem 2.1: An estimator is sufficient if and only if the
probability density function can be factored into two functions
g and h, where h is dependent on the estimator and the parameter
and g is independent of the parameter. That is, $\hat\theta$ is
sufficient if $\prod f(x_i;\theta) = g(x_1, \ldots, x_n \mid \hat\theta)\, h(\hat\theta;\theta)$.

If sufficient statistics exist, it has been shown that they will be
solutions of maximum likelihood.^9

5) Invariance: This property, which is to be discussed at length
in chapter three, is usually associated with maximum likelihood estimation.
The property implies that if the mle of $\theta$ is $\hat\theta$ and certain regularity
conditions are satisfied, a mle of $\phi(\theta)$ is $\phi(\hat\theta)$. That is, a mle of a function of
$\theta$ is simply the function with the value of $\hat\theta$ substituted for $\theta$.

Maximum likelihood estimates are usually biased, consistent, efficient,
invariant, and a function of a sufficient statistic if one exists. Under
fairly general regularity conditions $\hat\theta$ is asymptotically normally
distributed, has finite variance with limiting value $\sigma^2 = 1/I(\theta)$ where
$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta} \log f(X;\theta)\right)^2\right]$, and therefore is asymptotically efficient.^10
No other asymptotically normally distributed estimator can have smaller
variance.^11 If an efficient statistic exists for small samples (i.e., with
minimum variance), a mle with bias correction, if necessary, will be it.^12
This follows from the fact that if there is an unbiased efficient estimate,
the maximum likelihood method will produce it.^13 Similarly, if there is a
sufficient statistic for estimating the true parameter value, any solution
of the likelihood equation will be a function of it.

^9 R. A. Fisher, Contributions to Mathematical Statistics, John Wiley
and Sons, Inc., p. 224, 1958

From the preceding summary, it can be seen why MLE has become a favored
and often used technique in the field of statistical estimation. Although
each of the estimation techniques has its strong points and proponents
(Pearson hotly defended the method of moments as "best" (44)), the mle is
generally expected to exhibit more of the desirable properties of a point
estimator. Still, for certain instances, depending on the situation and problem
at hand, the use of other estimation techniques may seem more logical and/or
be easier. In fact, for certain distributions different techniques may
produce the same estimate although generally they are different. The methods
of moments and maximum likelihood produce the same estimates for the
parameters of the normal, Poisson, and binomial probability distributions.^14

^10 H. Cramer, Mathematical Methods of Statistics, Princeton University
Press, pp. 500-506, 1946

^11 A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 160, 1950

^12 R. L. Anderson and T. A. Bancroft, Statistical Theory in Research,
McGraw-Hill Book Company, Inc., p. 102, 1952

^13 B. W. Lindgren, Statistical Theory, The Macmillan Company, p. 226,
1960

^14 S. S. Wilks, Mathematical Statistics, Princeton University Press,
p. 146, 1943
2.4 Review of the Literature

A search of literature failed to yield any new and significant information
concerning the application of the invariance principle to MLE. Of the
college level statistics texts reviewed, 17 contained sections on point
estimation. Of these, only five discussed invariance as associated with
maximum likelihood estimation, and each of these was restricted by the condition
that the functional relationship be single valued or one-to-one. It is
interesting to note that Mood^15 in his discussion of the property of invariance
as applicable to MLE states that, ". . . if $\hat\theta$ is the maximum-likelihood
estimate for $\theta$, and if $u(\theta)$ is any single-valued function of $\theta$, then $u(\hat\theta)$ is
the maximum-likelihood estimate for $u(\theta)$." However, in his proof of this
property, it is implicitly assumed that an inverse function $\theta = v(u)$ is
defined and he shows that the mle for u is the value of u that maximizes
$L(v(u))$. Then, in addition to the necessity of having a single-valued
function for the property to be applied as described by Mood, the inverse
function must also exist. But even when the function is single-valued (but
many to one, of course) there are many ways to define an inverse function.
As shall be seen below, special care must be exercised in defining such an
inverse. It also illustrates one of the situations motivating this
investigation, namely, that discussions of the invariant property are often
incomplete in the above sense.

A conspicuous absence of literature concerning this property could be
construed to indicate either that the problem is so trivial that it is
unnecessary to record methods of application, or that the problem is of no
practical or theoretical interest. Preliminary investigation of the problem
at hand leads to rejection of both alternatives. The SP study mentioned in
the opening pages of this paper is just one of many indicators that the
problem is not trivial. Also, it provides a real practical application of the
invariant property of MLE in an area not adequately covered by present
concepts and definitions.

^15 A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 159, 1950
SECTION III



A NEW APPROACH 



3.1 The Induced Likelihood Function

We have seen that previous applications of the invariance principle to
the method of maximum likelihood estimation have been restricted so as to
provide a 1-1 relationship between the domain and range of the functions of the
parameters being estimated. Below a theorem is stated as it usually occurs
in the 1-1 estimation problem.

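Theorem 3.1:

If 1) $f: S \to E^1$ (read, the function f maps S into $E^1$)

2) $\phi: S \to S^*$ is 1-1 and onto, therefore $\phi^{-1}: S^* \to S$

3) $g: S^* \to E^1$ is defined by $g(\theta^*) = f(\phi^{-1}(\theta^*))$

4) there exists an element of S, denoted $\theta_0$, such that
$f(\theta_0) \ge f(\theta)$ for all $\theta$ in S

then $g(\theta_0^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$, where $\theta_0^* = \phi(\theta_0)$.

Proof: 1) Let $\theta^*$ be an element of $S^*$. Then $\phi^{-1}(\theta^*)$ is an
element of S and

2) $g(\theta^*) = f(\phi^{-1}(\theta^*)) \le f(\theta_0) = f(\phi^{-1}(\theta_0^*)) = g(\theta_0^*)$,

i.e. $g(\theta_0^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$, and if $\theta_0$ is
unique, strict inequality holds and $\theta_0^*$ is unique.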

So far both the theorem and notation are conventional and application
of the theorem to maximum likelihood estimation is as follows. Let S be
the parameter space of the estimation problem; $E^1$ is the real line. The
likelihood function $L(\theta)$ is such that $L: S \to E^1$ and $L(\hat\theta) \ge L(\theta)$ for all
$\theta$ an element of S. Suppose there exists a function $\phi$ such that $\phi: S \to S^*$
is 1-1 and onto, so that $\phi^{-1}: S^* \to S$. Define the "Induced Likelihood Function",
$M(\theta^*) = L(\phi^{-1}(\theta^*))$, the likelihood function induced on $S^*$. We now have the
essential elements to apply theorem 3.1.

1) $L: S \to E^1$

2) $\phi: S \to S^*$, 1-1 and onto

3) $M: S^* \to E^1$

4) $\hat\theta$ is the value of $\theta$ such that $L(\hat\theta) \ge L(\theta)$ for all $\theta$ an element of S.

Therefore if $\hat\theta^* = \phi(\hat\theta)$, by theorem 3.1 $M(\hat\theta^*) \ge M(\theta^*)$ for all $\theta^*$ in $S^*$ and
a mle of $\theta^*$ is $\phi(\hat\theta)$. If $\hat\theta$ is unique, then $\hat\theta^*$ is unique.

Although it becomes apparent with application, let it be emphasized at
this point that the concept of the induced likelihood function (ILF) and
the manner in which it is defined is a most important element of the
application of the theorem. A new likelihood function is defined on $S^*$ and the
mle is the parameter value in $S^*$ which maximizes this new function.

Prior to looking at situations in which the 1-1 condition is dropped,
consider the following interesting example which emphasizes the importance
of the definition of the new likelihood function on the transformed parameter
space $S^*$. Let all of the essential conditions of theorem 3.1 hold and let
$S^*$ be contained in S. The theorem still applies and along with conventional
MLE procedures produces $\phi(\hat\theta)$ as the mle of $\theta^*$. That is, the MLE procedure
on S is carried out as usual and produces $\hat\theta$, a mle of $\theta$. However, in this
case $L(\theta)$ is defined not only on S but on $S^*$ as well. What happens when
the likelihood function is restricted to $S^*$? Naturally, it is not expected
that the restricted mle will always be the same as that produced in the
unrestricted case since the unrestricted estimate may not be a member of $S^*$.
However, the interesting fact is that $\phi(\hat\theta)$ is not necessarily the restricted
mle even when $\hat\theta$ is an element of $S^*$. As an example consider the exponential
distribution.

1) Let $f(x;\theta) = \theta e^{-\theta x}$ for $\theta > 0$.

2) $L(\theta) = \prod f(x_i;\theta) = \theta^n e^{-n\theta\bar{x}}$

$L'(\theta) = n\theta^{n-1} e^{-n\theta\bar{x}} (1 - \bar{x}\theta)$

$\hat\theta = 1/\bar{x}$

3) Let $\lambda = \phi(\theta)$ be a 1-1 transformation of $S = (0, \infty)$ onto $S^* = (0, 1)$,
so that $\theta = \phi^{-1}(\lambda)$ and $g(x;\lambda) = f(x;\phi^{-1}(\lambda))$ for $0 < \lambda < 1$.

4) Let $M(\lambda) = L(\theta \mid 0 < \theta < 1)$ for the restricted estimation problem.
Then $\hat\theta = 1/\bar{x}$ for $\bar{x} > 1$ and is undefined for $\bar{x} \le 1$.

5) However, if $M(\lambda) = L(\phi^{-1}(\lambda))$, all of the essential
conditions of theorem 3.1 are fulfilled.

1) $L(\theta): S \to E^1$

2) $\phi: S \to S^*$, 1-1 and onto

3) $M(\lambda): S^* \to E^1$ is defined by $M(\lambda) = L(\phi^{-1}(\lambda))$

4) $\hat\theta$ is the value of $\theta$ such that $L(\hat\theta) \ge L(\theta)$ for all $\theta$ in S.

Therefore by theorem 3.1 $M(\lambda)$ is maximized by $\hat\lambda = \phi(\hat\theta) = \phi(1/\bar{x})$, and
the restricted mle ($1/\bar{x}$ for $\bar{x} > 1$) is not equal to $\phi(\hat\theta)$. Had $S^*$ not been
contained in S, the defining of the ILF would be absolutely necessary since
$L(\theta)$ would have no meaning on $S^*$.
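
The distinction between the restricted likelihood and the induced likelihood can be checked numerically. The sketch below is illustrative only: it assumes the particular 1-1 map $\phi(\theta) = \theta/(1+\theta)$ of $(0, \infty)$ onto $(0, 1)$, which is a choice made here for the illustration and may differ from the transformation used in the example above. The sample size, sample mean, and helper names are likewise hypothetical.

```python
import math

def L(theta, n, xbar):
    """Exponential likelihood L(theta) = theta^n * exp(-n * theta * xbar)."""
    return theta ** n * math.exp(-n * theta * xbar)

def phi(theta):
    """Assumed 1-1 map of S = (0, inf) onto S* = (0, 1); not taken from the thesis."""
    return theta / (1.0 + theta)

def phi_inv(lam):
    """Inverse of the assumed map phi."""
    return lam / (1.0 - lam)

def argmax_on_unit_interval(func, steps=10_000):
    """Maximizer of func over the interior grid k/steps, k = 1, ..., steps - 1."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=func)

n, xbar = 5, 2.0                  # hypothetical sample size and sample mean
theta_hat = 1.0 / xbar            # unrestricted mle; here 0.5, which lies inside S*

# Likelihood simply restricted to S* = (0, 1): maximized near theta_hat itself.
restricted = argmax_on_unit_interval(lambda t: L(t, n, xbar))

# Induced likelihood M(lam) = L(phi_inv(lam)): maximized near phi(theta_hat).
induced = argmax_on_unit_interval(lambda lam: L(phi_inv(lam), n, xbar))
```

Under these assumptions the two maximizers differ even though $\hat\theta$ lies in $S^*$, which is the point made in the text: the two functions defined on $S^*$ are different likelihood functions.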

Taking note of the use of the 1-1 property in conventional maximum
likelihood estimation, it is seen that the assumption that $\phi$ is 1-1 is used only
in defining $M(\theta^*)$ as a single valued function. If $\phi$ is not 1-1, how may the
MLE problem be handled? As before, the key concept is the characterization
of the new likelihood function and it can be shown that, with proper
definition of the ILF, it is still maximized at $\phi(\hat\theta)$.
Consider the case where $f: S \to E^1$ and $\phi: S \to S^*$ is onto, that is, the
function is exhaustive but not necessarily 1-1. The ILF, the likelihood
function induced on $S^*$, is defined in the following manner. If $L(\hat\theta) \ge L(\theta)$ for
all $\theta$ an element of S, then let $\hat\theta^*$ be any value of $\phi(\hat\theta)$. Using the Axiom of
Choice, if necessary, define an inverse $\phi^{-1}$ on $S^*$ such that $\phi^{-1}(\hat\theta^*) = \hat\theta$ and for
any other $\theta^*$ in $S^*$, $\phi^{-1}(\theta^*) = \theta$ where $\theta$ is any element of S such that
$\phi(\theta) = \theta^*$. Then $\phi^{-1}: S^* \to S$. Now theorem 3.1 can be extended and stated
in a more general form.

Theorem 3.2:

If 1) $f: S \to E^1$ and $f(\hat\theta) \ge f(\theta)$ for all $\theta$ in S

2) $\phi: S \to S^*$ is onto, $\phi(\hat\theta) = \hat\theta^*$,
and $\phi^{-1}: S^* \to S$ is defined as above

3) $g: S^* \to E^1$ is defined by $g(\theta^*) = f(\phi^{-1}(\theta^*))$

then $g(\hat\theta^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$.

Proof:

1) Let $\theta^*$ be an element of $S^*$.

2) $g(\theta^*) = f(\phi^{-1}(\theta^*)) = f(\theta) \le f(\hat\theta) = f(\phi^{-1}(\hat\theta^*)) = g(\hat\theta^*)$

Thus, $g(\hat\theta^*) \ge g(\theta^*)$ for all $\theta^*$ an element of $S^*$.

In the estimation problem let $M(\theta^*) = L(\phi^{-1}(\theta^*))$. Then $M(\hat\theta^*) \ge M(\theta^*)$
so $M(\theta^*)$ is maximized by $\hat\theta^* = \phi(\hat\theta)$. The mle of $\theta^*$ is $\phi(\hat\theta)$ just as in the
1-1 estimation situation. The maximization of $M(\theta^*)$ may not, in effect,
have been over all the elements in S since $\phi^{-1}$ is not onto S, but it has
taken place over the set containing $\hat\theta$, which is the essential factor.
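
The construction of $\phi^{-1}$ and of the induced function can be sketched on a finite parameter space, where the choice of one preimage per point of $S^*$ is explicit. This is an illustration under assumptions, not the thesis's own procedure: the finite grid `S`, the stand-in objective `L`, the many-to-one map `phi`, and the helper name `induced_likelihood` are all hypothetical.

```python
def induced_likelihood(L, phi, S):
    """Build M on phi(S) by choosing one preimage per point of S*.

    The preimage of phi(theta_hat) is forced to be theta_hat, as in the
    construction preceding Theorem 3.2; every other point of S* keeps the
    first preimage encountered (any preimage would do).
    """
    theta_hat = max(S, key=L)
    inverse = {}
    for theta in S:
        inverse.setdefault(phi(theta), theta)   # arbitrary choice of preimage
    inverse[phi(theta_hat)] = theta_hat         # the one constrained choice
    M = {star: L(theta) for star, theta in inverse.items()}
    return M, theta_hat

S = [k / 10 for k in range(-20, 21)]            # hypothetical finite parameter space
L = lambda theta: -(theta - 0.7) ** 2           # hypothetical real-valued f, peak at 0.7
phi = lambda theta: round(theta * theta, 10)    # many-to-one: theta and -theta collide

M, theta_hat = induced_likelihood(L, phi, S)
lambda_hat = max(M, key=M.get)                  # maximizer of the induced function
```

In this sketch, removing the line that pins the preimage of `phi(theta_hat)` would leave $-0.7$ as the chosen preimage of $0.49$, and `M` would then peak at $0.0$ rather than at $\phi(\hat\theta)$. That is consistent with the text's emphasis on how the inverse is defined.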

Having repeatedly emphasized the importance of the definition of M,
the ILF, it seems reasonable at this point to acknowledge the fact that
there may be many ways to define $\phi^{-1}$ and M. In some cases the definitions
may be such that M is not maximized at $\phi(\hat\theta)$, but this is not necessarily
brought on by dropping the 1-1 restriction and in fact these same remarks
apply even in the 1-1 case. In the restricted exponential estimation example,
which was 1-1, two likelihood functions were defined on $S^*$ and one was not
maximized at $\phi(\hat\theta)$.

Although the term "likelihood function" has been used extensively in
theoretical statistics for quite a number of years, it appears that the term
may be used rather loosely unless more emphasis is placed on the definition
in a given problem. It is suggested that the notion of ILF may be an idea
which will help to emphasize this point.

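3.2 Examples of the Application of Theorem 3.2 and the Induced Likelihood
Function to Maximum Likelihood Estimation

3.2.1 Geometric Distribution

1) Let $f(x;\theta) = \theta(1-\theta)^{x-1}$, $0 \le \theta \le 1$

2) $L(\theta) = \theta^n (1-\theta)^{n\bar{x}-n}$

$L'(\theta) = n\theta^{n-1}(1-\theta)^{n\bar{x}-n} - n(\bar{x}-1)\theta^n(1-\theta)^{n\bar{x}-n-1}$

$0 = 1 - \hat\theta - \hat\theta(\bar{x}-1)$

$\hat\theta = 1/\bar{x}$

3) Let $\lambda = \phi(\theta) = \theta$ for $0 \le \theta \le \frac{1}{2}$
                         $= 1 - \theta$ for $\frac{1}{2} \le \theta \le 1$

$\phi: [0, 1] \to [0, \frac{1}{2}]$

4) Define $\phi^{-1}(\lambda) = \theta = \lambda$ if $\bar{x} \ge 2$
                                 $= 1 - \lambda$ if $\bar{x} \le 2$

$\phi^{-1}: [0, \frac{1}{2}] \to [0, 1]$

5) Let $M(\lambda) = L(\phi^{-1}(\lambda)) = L(\lambda)$ if $\bar{x} \ge 2$
                                      $= L(1-\lambda)$ if $\bar{x} \le 2$

Therefore by theorem 3.2

$\hat\lambda = \phi(\hat\theta) = \hat\theta = 1/\bar{x}$ if $\bar{x} \ge 2$
                            $= 1 - \hat\theta = 1 - 1/\bar{x}$ if $\bar{x} \le 2$

6) Checking the results directly

for $\bar{x} \ge 2$, $M(\lambda) = L(\lambda)$, therefore $\hat\lambda = \hat\theta = 1/\bar{x}$

for $\bar{x} \le 2$, $M(\lambda) = L(1-\lambda)$

$M(\lambda) = (1-\lambda)^n [1-(1-\lambda)]^{n\bar{x}-n}$

$M'(\lambda) = -n(1-\lambda)^{n-1}\lambda^{n\bar{x}-n} + (1-\lambda)^n\, n(\bar{x}-1)\lambda^{n\bar{x}-n-1}$

$0 = n(1-\hat\lambda)^{n-1}\hat\lambda^{n\bar{x}-n-1}\,[-\hat\lambda + (1-\hat\lambda)(\bar{x}-1)]$

$1 = \bar{x}(1-\hat\lambda)$

$\hat\lambda = 1 - 1/\bar{x}$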
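
The two branches of this example can be verified by brute force. The following is a small numerical check, assuming hypothetical values of n and $\bar{x}$; the grid resolution and the helper names `geometric_L`, `induced_M`, and `argmax_M` are illustrative choices, not part of the original.

```python
def geometric_L(theta, n, xbar):
    """L(theta) = theta^n * (1 - theta)^(n*xbar - n)."""
    return theta ** n * (1.0 - theta) ** (n * xbar - n)

def induced_M(lam, n, xbar):
    """M(lam) = L(phi^{-1}(lam)), with the branch of phi^{-1} chosen by x-bar."""
    theta = lam if xbar >= 2 else 1.0 - lam
    return geometric_L(theta, n, xbar)

def argmax_M(n, xbar, steps=5_000):
    """Maximizer of M over an evenly spaced grid on S* = [0, 1/2]."""
    grid = [0.5 * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: induced_M(lam, n, xbar))

lam_hat_a = argmax_M(n=10, xbar=4.0)     # x-bar >= 2: compare with 1 / x-bar
lam_hat_b = argmax_M(n=10, xbar=1.25)    # x-bar <= 2: compare with 1 - 1 / x-bar
```

Both grid maximizers should land, up to the grid spacing, on the value $\phi(\hat\theta)$ given by theorem 3.2 for the corresponding case.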



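3.2.2 Normal Distribution

1) Let $f(x;\theta) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x-\theta)^2}$ for $-\infty < \theta < \infty$

2) $L(\theta) = \left(\frac{1}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\sum (x_i-\theta)^2}$

$L'(\theta) = \left(\frac{1}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\sum (x_i-\theta)^2} \left[\sum (x_i - \theta)\right]$

$0 = \sum (x_i - \hat\theta)$

$\hat\theta = \bar{x}$

3) Let $\lambda = \phi(\theta) = \theta^2$

$\phi: E^1 \to [0, \infty)$ and is not 1-1

4) Define $\phi^{-1}(\lambda) = \theta = \sqrt{\lambda}$ if $\bar{x} \ge 0$
                                 $= -\sqrt{\lambda}$ if $\bar{x} < 0$

5) Let $M(\lambda) = L(\phi^{-1}(\lambda)) = L(\sqrt{\lambda})$ if $\bar{x} \ge 0$
                                      $= L(-\sqrt{\lambda})$ if $\bar{x} < 0$

Therefore by theorem 3.2

$\hat\lambda = \phi(\hat\theta) = \hat\theta^2 = \bar{x}^2$

6) Checking the results directly

if $\bar{x} \ge 0$, $M(\lambda) = L(\sqrt{\lambda}) = \left(\frac{1}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\sum (x_i-\sqrt{\lambda})^2}$
and $\sqrt{\hat\lambda} = \bar{x}$, therefore $\hat\lambda = \bar{x}^2$

if $\bar{x} < 0$, $M(\lambda) = L(-\sqrt{\lambda}) = \left(\frac{1}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\sum (x_i+\sqrt{\lambda})^2}$

$M'(\lambda) = -\frac{1}{2\sqrt{\lambda}} \left(\frac{1}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\sum (x_i+\sqrt{\lambda})^2} \left[\sum (x_i + \sqrt{\lambda})\right]$

$0 = \sum (x_i + \sqrt{\hat\lambda})$

$\sqrt{\hat\lambda} = -\bar{x}$, therefore $\hat\lambda = \bar{x}^2$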
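
A short numerical sketch of this example follows. It assumes unit variance, as the likelihood above does, and uses hypothetical data with a negative sample mean; the function names and grid bounds are illustrative. The second maximization uses the other possible inverse, $+\sqrt{\lambda}$, to show what happens when the branch is not matched to the data.

```python
import math

def normal_log_L(theta, sample):
    """log L(theta) for unit-variance normal data, additive constant dropped."""
    return -0.5 * sum((x - theta) ** 2 for x in sample)

def argmax_lambda(sample, branch, hi=25.0, steps=25_000):
    """Maximize log L(branch(lam)) over an evenly spaced grid on [0, hi]."""
    grid = [hi * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: normal_log_L(branch(lam), sample))

sample = [-2.5, -1.0, -3.0, -1.5]        # hypothetical data, x-bar = -2
xbar = sum(sample) / len(sample)

# Branch of phi^{-1} selected by the sign of x-bar, as in step 4.
if xbar >= 0:
    data_branch = lambda lam: math.sqrt(lam)
else:
    data_branch = lambda lam: -math.sqrt(lam)

lam_hat = argmax_lambda(sample, data_branch)                    # ILF as defined above
lam_wrong = argmax_lambda(sample, lambda lam: math.sqrt(lam))   # the other inverse
```

With the inverse chosen as in step 4 the maximizer sits at $\bar{x}^2$; with the other inverse the function is maximized at the boundary $\lambda = 0$ for these data, since $\hat\theta$ is then outside the range of the inverse.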

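3.2.3 Binomial Distribution

1) Let $f(x;\theta) = \theta^x (1-\theta)^{1-x}$, $x = 0, 1$, for $0 < \theta < 1$

2) $L(\theta) = \theta^{n\bar{x}} (1-\theta)^{n-n\bar{x}}$

$L'(\theta) = n\bar{x}\theta^{n\bar{x}-1}(1-\theta)^{n-n\bar{x}} - n(1-\bar{x})\theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}-1}$

$= n\theta^{n\bar{x}-1}(1-\theta)^{n-n\bar{x}-1}\,[\bar{x}(1-\theta) - \theta + \theta\bar{x}]$

$0 = \bar{x}(1-\hat\theta) - \hat\theta + \hat\theta\bar{x}$

$\hat\theta = \bar{x}$

3) Let $\lambda = \phi(\theta) = 2\theta$ for $0 < \theta \le \frac{1}{2}$
                         $= 2 - 2\theta$ for $\frac{1}{2} \le \theta < 1$

$\phi: (0, 1) \to (0, 1]$ but is not 1-1

4) Define $\phi^{-1}(\lambda) = \theta = \frac{\lambda}{2}$ if $\bar{x} \le \frac{1}{2}$
                                 $= \frac{2-\lambda}{2}$ if $\bar{x} > \frac{1}{2}$

$\phi^{-1}: (0, 1] \to (0, 1)$

5) Let $M(\lambda) = L(\phi^{-1}(\lambda)) = L(\frac{\lambda}{2})$ if $\bar{x} \le \frac{1}{2}$
                                      $= L(\frac{2-\lambda}{2})$ if $\bar{x} > \frac{1}{2}$

Therefore by theorem 3.2

$\hat\lambda = \phi(\hat\theta) = 2\hat\theta = 2\bar{x}$ if $\bar{x} \le \frac{1}{2}$
                            $= 2 - 2\hat\theta = 2(1-\bar{x})$ if $\bar{x} > \frac{1}{2}$

6) Checking the results directly

If $\bar{x} \le \frac{1}{2}$, $M(\lambda) = L(\frac{\lambda}{2}) = (\frac{\lambda}{2})^{n\bar{x}} (1 - \frac{\lambda}{2})^{n(1-\bar{x})}$

$M'(\lambda) = \frac{n}{2}(\frac{\lambda}{2})^{n\bar{x}-1}(1-\frac{\lambda}{2})^{n(1-\bar{x})-1}\,[\bar{x}(1-\frac{\lambda}{2}) - \frac{\lambda}{2}(1-\bar{x})]$

$0 = \bar{x} - \frac{\hat\lambda}{2}$

$\hat\lambda = 2\bar{x}$

If $\bar{x} > \frac{1}{2}$, $M(\lambda) = L(\frac{2-\lambda}{2}) = (\frac{2-\lambda}{2})^{n\bar{x}} [1 - (\frac{2-\lambda}{2})]^{n-n\bar{x}}$

$M'(\lambda) = -\frac{n}{2}(\frac{2-\lambda}{2})^{n\bar{x}-1}[1-(\frac{2-\lambda}{2})]^{n-n\bar{x}-1}\,[\bar{x}(1-\frac{2-\lambda}{2}) - (\frac{2-\lambda}{2})(1-\bar{x})]$

$0 = \bar{x} - (\frac{2-\hat\lambda}{2})$

$\hat\lambda = 2 - 2\bar{x} = 2(1-\bar{x})$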
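
As with the previous examples, the result can be checked by maximizing the induced likelihood on a grid. The sketch below works with $\log M(\lambda)$ for numerical convenience; the values of n and $\bar{x}$, the grid, and the helper names are hypothetical.

```python
import math

def bernoulli_log_L(theta, n, xbar):
    """log L(theta) = n*xbar*log(theta) + n*(1 - xbar)*log(1 - theta)."""
    return n * xbar * math.log(theta) + n * (1 - xbar) * math.log(1 - theta)

def induced_log_M(lam, n, xbar):
    """log M(lam), with the branch of phi^{-1} chosen by x-bar as in step 4."""
    theta = lam / 2 if xbar <= 0.5 else (2 - lam) / 2
    return bernoulli_log_L(theta, n, xbar)

def argmax_lambda(n, xbar, steps=10_000):
    """Maximizer of log M over the interior grid k/steps, k = 1, ..., steps - 1."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda lam: induced_log_M(lam, n, xbar))

lam_hat_low = argmax_lambda(n=20, xbar=0.3)    # x-bar <= 1/2: compare with 2 * x-bar
lam_hat_high = argmax_lambda(n=20, xbar=0.8)   # x-bar > 1/2: compare with 2 * (1 - x-bar)
```

The grid omits the endpoints so that both logarithms stay finite; this is a convenience of the sketch and does not affect the location of the maximum for these inputs.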
3.3 The Multidimensional Estimation Problem

At this point, it seems logical to consider how the theorem applies in
the estimation problem with a multidimensional parameter space. All examples
considered to this point have been one-dimensional. However, since
restrictions on dimensionality of the parameter space do not occur in the theorem
or its proof, it follows that the theorem applies to the multidimensional
estimation problem. Let $\underline{\theta} = (\theta_1, \ldots, \theta_k)$. If $\underline{\theta}$ is multidimensional,
then so is $\hat{\underline{\theta}}$ and the components $(\hat\theta_1, \ldots, \hat\theta_k)$ are said to be the
joint maximum likelihood estimates of the corresponding $\theta_i$.

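Consider the following example of the normal distribution with $\mu = \theta_1$,
$\sigma^2 = \theta_2$ and k = 2.

1) Let $f(x;\underline{\theta}) = \frac{1}{\sqrt{2\pi\theta_2}} \exp\left[-\frac{(x-\theta_1)^2}{2\theta_2}\right]$

2) $L(\underline{\theta}) = (2\pi\theta_2)^{-n/2} \exp\left[-\frac{1}{2\theta_2}\sum (x_i-\theta_1)^2\right]$

$\hat\theta_1 = \bar{x}$, $\hat\theta_2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = s^2$

3) Let $\underline{\lambda} = \phi(\underline{\theta}) = (\lambda_1, \lambda_2) = (\theta_1^2, \theta_2)$

$\phi$ is not 1-1

4) Define $\phi^{-1}(\underline{\lambda}) = (\theta_1, \theta_2) = (\sqrt{\lambda_1}, \lambda_2)$ if $\bar{x} \ge 0$
                                                   $= (-\sqrt{\lambda_1}, \lambda_2)$ if $\bar{x} < 0$

5) Define $M(\underline{\lambda}) = L(\phi^{-1}(\underline{\lambda})) = L(\sqrt{\lambda_1}, \lambda_2)$ if $\bar{x} \ge 0$
                                                          $= L(-\sqrt{\lambda_1}, \lambda_2)$ if $\bar{x} < 0$

Therefore by theorem 3.2

$\hat{\underline{\lambda}} = \phi(\hat{\underline{\theta}}) = (\hat\lambda_1, \hat\lambda_2) = (\hat\theta_1^2, \hat\theta_2) = (\bar{x}^2, s^2)$

6) Checking the results directly

if $\bar{x} \ge 0$, $M(\underline{\lambda}) = (2\pi\lambda_2)^{-n/2} \exp\left[-\frac{1}{2\lambda_2}\sum (x_i-\sqrt{\lambda_1})^2\right]$

$\frac{\partial M}{\partial\lambda_1} = M(\underline{\lambda})\,\frac{1}{2\lambda_2\sqrt{\lambda_1}}\sum (x_i - \sqrt{\lambda_1}) = 0$

$\sqrt{\hat\lambda_1} = \bar{x}$, therefore $\hat\lambda_1 = \bar{x}^2 = \hat\theta_1^2$

$\frac{\partial M}{\partial\lambda_2} = M(\underline{\lambda})\left[-\frac{n}{2\lambda_2} + \frac{1}{2\lambda_2^2}\sum (x_i - \sqrt{\lambda_1})^2\right] = 0$

$\hat\lambda_2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = s^2$, therefore $\hat\lambda_2 = s^2 = \hat\theta_2$

Similarly, if $\bar{x} < 0$,

$\hat{\underline{\lambda}} = (\hat\lambda_1, \hat\lambda_2) = (\bar{x}^2, s^2)$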
In some situations we may desire to estimate only a portion of the
components of $\underline{\theta}$.
It should be noted that even though the estimates of only certain components
of $\underline{\theta}$ are desired, it may be necessary to estimate the remaining parameters
since the maximizing values for the desired set usually depend on the
remaining parameters. This characteristic is demonstrated in the example just
completed where the mle of the variance depends on the mle of the mean.

In estimating only certain of the components of $\underline{\theta}$ when the remaining
parameters are unknown, theorem 3.2 is applied. However, if some of the
remaining parameters are known, then the problem is quite different and the
dimension of the parameter (estimation) space is reduced by one for each
known parameter value. The problem of estimating the variance of a normal
distribution with parameters $\mu = \theta_1$ and $\sigma^2 = \theta_2$ serves to illustrate this
point.

Case I: $\mu$, $\sigma^2$ unknown

We have seen that $\hat\theta_1 = \bar{x}$ and $\hat\theta_2 = s^2$. In this case, the parameter
space is two-dimensional, a half-plane. That is, $L: S \to E^1$ where S is
$E^1 \times (0, \infty)$.

Case II: $\mu$ known, $\sigma^2$ unknown

In this case $f(x;\theta_2) = \frac{1}{\sqrt{2\pi\theta_2}} \exp\left[-\frac{(x-\mu)^2}{2\theta_2}\right]$. Since $\mu$ is known, L is
a function of $\theta_2$ only and it is well known that $\hat\theta_2 = \frac{1}{n}\sum (x_i - \mu)^2$. In
this case the problem is no longer to estimate a component of a
two-dimensional $\underline{\theta}$; rather we have a new one-dimensional estimation problem where S
is a subset of $E^1$.

Note that Case I produced $\hat\theta_2 = \frac{1}{n}\sum (x_i - \bar{x})^2$ while Case II produces
$\hat\theta_2 = \frac{1}{n}\sum (x_i - \mu)^2$, usually quite different results. These differences,
however, are not due to the application of the theorem, but result from the
fact that the two likelihood functions are different.
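
The difference between the two cases is easy to see on a small data set. The sketch below simply evaluates the two closed-form estimates; the sample values, the assumed known mean `mu`, and the function names are hypothetical.

```python
def variance_mle_mean_unknown(sample):
    """Case I: (1/n) * sum (x_i - x-bar)^2, with the mean estimated jointly."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / n

def variance_mle_mean_known(sample, mu):
    """Case II: (1/n) * sum (x_i - mu)^2, with the mean treated as known."""
    return sum((x - mu) ** 2 for x in sample) / len(sample)

sample = [1.0, 2.0, 4.0, 5.0]                        # hypothetical data, x-bar = 3
case_1 = variance_mle_mean_unknown(sample)
case_2 = variance_mle_mean_known(sample, mu=2.0)     # mu = 2 is an assumed known mean
xbar = sum(sample) / len(sample)
```

For any sample the two estimates satisfy `case_2 = case_1 + (xbar - mu)**2`, so they agree only when the known mean happens to equal the sample mean.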






SECTION IV 



SUMMARY 



4.1 Summary of Findings

The objective of this study was to investigate and formalize concepts
and definitions that would allow the invariant property of MLE to be extended
beyond the usually assumed 1-1 estimation situation. The induced likelihood
function was introduced, and it has been shown that by properly defining
the ILF, theorem 3.2 provides the tool for applying the invariance principle
in the estimation problem with a transformation which is not 1-1.
The theorem was shown to be equally applicable in the 1 or k dimension
estimation situation.

In the development of theorem 3.2 it has been strongly emphasized that
the power of the technique lies in the defining of the new likelihood
function, the likelihood function induced on $S^*$. It is felt that, in the past,
not enough emphasis has been focused on this induced likelihood function.

4.2 Proposed Areas for Further Study

This study has not attempted to investigate the distribution theory
related to the mle's $\hat\lambda = \phi(\hat\theta)$ derived using the ILF. Certainly, it is
important to know if present mle distribution theory is still applicable in
the unrestricted estimation situation. Therefore, it is suggested that an
area which presents fertile ground for study is mle distribution theory in
the new situations covered in this study.

The examples presented in this investigation are simple and are
intended merely to acquaint the reader with the proposed use of the theorem
and the ILF. It is hoped that this study has generated reader interest
which will result in application of the induced likelihood function and
the associated theorem in a wide variety of estimation situations.
APPENDIX ONE

SYMBOLS AND ABBREVIATIONS

Symbol                           Definition                                   Page*

mle                              maximum likelihood estimate                  1
1-1                              one-to-one                                   1
MLE                              maximum likelihood estimation                1
$x_i$                            observed value of random variable $X_i$      3
$\theta_i$                       index for i-th parameter                     3
$f(x;\theta)$                    probability density function                 3
pdf                              probability density function                 3
$E(x)$                           the expectation of x                         3
$(x_1, x_2, \ldots, x_n)$        a sample or observed outcome                 3
$f(\theta \mid x)$               the conditional pdf of $\theta$ given X = x  5
$L(\theta)$                      the likelihood function                      5
$\hat\theta(x_1, x_2, \ldots, x_n)$   an estimator                            6
$f: S \to E^1$                   the function f is such that it maps          13
                                 S into $E^1$
$E^1$                            the real line, Euclidean 1-space             13
$\phi: S \to S^*$ (1-1, onto)    the function $\phi$ is such that it maps     13
                                 S onto $S^*$ (onto implies "exhaustive")
                                 and is 1-1
$M(\ )$                          the induced likelihood function              14
ILF                              the induced likelihood function              14
$[0, 1]$                         the closed interval 0, 1                     17
$(0, 1]$                         the half-closed interval 0, 1                18
$(0, 1)$                         the open interval 0, 1                       18
$\underline{\theta}$             $(\theta_1, \ldots, \theta_k)$               20

* Page on which symbol originally was introduced